Statistical Evaluation of Information Distillation Systems

Authors

  • J. V. White
  • D. Hunter
  • J. D. Goldstein
Abstract

We describe a methodology for evaluating the statistical performance of information distillation systems and apply it to a simple illustrative example. (An information distiller provides written English responses to English queries based on automated searches/transcriptions/translations of English and foreign-language sources. The sources include written documents and sound tracks.) The evaluation methodology extracts information nuggets from the distiller response texts and gathers them into fuzzy equivalence classes called nugs. The methodology supports the usual performance metrics, such as recall and precision, as well as a new information-theoretic metric called proficiency, which measures how much information a distiller provides relative to all of the information provided by a collection of distillers working on a common query and corpora. Unlike previous evaluation techniques, the methodology evaluates the relevance, granularity, and redundancy of information nuggets explicitly.

1. Information Distillers

An autonomous information distiller takes written queries as inputs, and in response, automatically gathers, transcribes, translates (if necessary), and distills relevant information from multilingual text and speech sources. The distiller outputs the distilled information in a readable document written in the same language as the query. The distiller also identifies all of the source files that support each fact or assertion in the distilled information. Precise distillers produce concise, clean output: they avoid presenting redundant, mistranscribed, mistranslated, or irrelevant information. Completely thorough distillers miss nothing: they report all of the relevant information in the corpora being queried. This paper discusses a methodology for statistically evaluating the information content of distiller responses.
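The recall and precision metrics named in the abstract can be illustrated with a small sketch. This is not the paper's definition (the excerpt defers its metrics to Section 6, which is not reproduced here). It assumes a hypothetical `Nugget` record, a hypothetical `relevance` table mapping each nug to a weight, and one possible redundancy rule: a nug is credited once, at the highest membership degree among the nuggets that express it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nugget:
    """One nugget extracted from a response (hypothetical structure)."""
    nug_id: Optional[str]     # nug it was assigned to; None if it matches no nug
    membership: float = 1.0   # assumed degree of membership in the nug, in [0, 1]


def credited_information(response, relevance):
    """Relevance-weighted credit, counting each nug once (assumed rule)."""
    best = {}
    for nugget in response:
        if nugget.nug_id in relevance:
            previous = best.get(nugget.nug_id, 0.0)
            best[nugget.nug_id] = max(previous, nugget.membership)
    return sum(relevance[nug_id] * degree for nug_id, degree in best.items())


def recall(response, relevance):
    """Credited information divided by all relevant information available."""
    total = sum(relevance.values())
    return credited_information(response, relevance) / total if total else 0.0


def precision(response, relevance):
    """Credited information divided by the number of nuggets reported.

    Redundant and irrelevant nuggets enlarge the denominator only,
    so they lower precision.
    """
    return credited_information(response, relevance) / len(response) if response else 0.0


# Illustrative data only: three nugs with assumed relevance weights.
relevance = {"n1": 1.0, "n2": 0.5, "n3": 1.0}
response = [
    Nugget("n1"),        # relevant
    Nugget("n1"),        # redundant repeat of the same nug
    Nugget("n2", 0.5),   # imprecise nugget, partial membership
    Nugget(None),        # irrelevant or mistranslated nugget
]

print(recall(response, relevance))
print(precision(response, relevance))
```

In this toy response the repeated nugget and the unmatched nugget contribute nothing to the numerator, which mirrors the text's description of precise distillers as those that avoid redundant or irrelevant output.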
The handling of document citations, the usability, readability, and utility of the responses, and translation quality metrics are not discussed in this paper. Although our evaluation methodology was developed to support the GALE (Global Autonomous Language Exploitation) program [1, 2], it is also applicable to other evaluations that share similar objectives. The methodology is based on analyzing the nuggets of information contained in the distiller's response. The nuggets may either be manually produced by annotators, as they are in the GALE program or as in the original Pyramid approach for evaluating summaries (Nenkova-Passonneau, 2003), or the nuggets may be automatically extracted, as in (Zhou-Hovy, 2007; Zhou-Hovy, 2006). However, even if nuggets are extracted automatically, the statistical methodology described here does require some manual annotation because annotators must assign relevance weights to the relatively precise and specific nuggets, and they must assign degrees of membership to any relatively imprecise nuggets that partially overlap more specific nuggets.

[1] This material is based upon work supported by the Defense Advanced Research Projects Agency DARPA/IPTO, Global Autonomous Language Exploitation, ARPA Order No. V018, Program Code No. 5M30, issued by DARPA/CMO under Contract #HR0011-06-C-003. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency or the U.S. Government.
[2] http://www.arpa.mil/ipto/programs/gale.

2. Evaluation Objectives

A primary objective of GALE is to compare the performance of several automatic machine distillers with multilingual human distillers (who use only consumer software tools for word processing, reviewing audio, and file searching).
Our evaluation methodology supports all of the standard statistical metrics for information extraction as well as several new ones, such as a citation-weighted F-metric, a rightness metric, and an information-theoretic proficiency metric, which are defined in Section 6.

A distiller is allowed to produce any readable response text that is consistent with the sources; there are no significant structural constraints on the distiller output. A valid response may be a sequence of direct quotes from the sources, or it may include paraphrases or summaries of source material. The distiller is free to use any kind of wording in its responses, as long as the resulting response is readable and free of redundancy. Therefore, a major objective of the evaluation is to evaluate the information content of unstructured responses. If two distillers use completely different wording but provide exactly the same information with the same redundancies, then their responses should be evaluated as being equivalent. In practice, one response may be more readable than the other, but this issue is ignored in the present methodology.

Our evaluation for GALE combines scoring of both query responses and document retrieval, but this paper deals only with the analysis of the query responses and does not address the evaluation of document citations. There have been a number of formal evaluations of document retrieval systems (TREC, 2005), and while these evaluations have some points of contact with the evaluation described in this paper, the more relevant comparison is with evaluations of query answering systems such as those done in the query answering track in TREC-2005


Similar resources

A New Statistical Model for Evaluation Interactive Question Answering Systems Using Regression

The development of computer systems and the extensive use of information technology in everyday life have made quick access to information increasingly important. The growing volume of information makes it difficult to manage or control. Thus, some instruments need to be provided to use this information. The QA system is ...

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

Evaluation of the Hospital Information Systems of Selected Hospitals in Tehran, 1390

Background and Aim: According to the objectives of the information systems and to avoid duplication and help to improve the quality of care and reduce costs, HIS ongoing evaluation should be conducted to achieve these goals. This study has evaluated hospital information systems in selected hospitals with the use of "integrated hospital information system evaluation criteria-2011". Materials an...

Evaluation of Document Citations in Phase 2 Gale Distillation

The focus of information retrieval evaluations, such as NIST’s TREC evaluations (e.g. Voorhees 2003), is on evaluation of the information content of system responses. On the other hand, retrieval tasks usually involve two different dimensions: reporting relevant information and providing sources of information, including corroborating evidence and alternative documents. Under the DARPA Global A...

Annotation of Nuggets and Relevance in GALE Distillation Evaluation

This paper presents an approach to annotation that BAE Systems has employed in the DARPA GALE Phase 2 Distillation evaluation. The purpose of the GALE Distillation evaluation is to quantify the amount of relevant and non-redundant information a distillation engine is able to produce in response to a specific, formatted query; and to compare that amount of information to the amount of informatio...

How to Evaluate Health Information Systems: Evaluation stages

The most important goal of health systems is improvement of quality, effectiveness and efficiency of health services. To achieve this goal, health care organizations should establish a proper structure for evaluating health information systems. Health information system evaluation is expected to identify the existing problems of the system through measuring specific indicators. The main objecti...


Publication date: 2008